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Abstract: Analyzing principal components for multivariate data from its spatial sign 
covariance matrix (SCM) has been proposed as a computationally simple and robust al¬ 
ternative to normal PCA, but it suffers from poor efficiency properties and is actually 
inadmissible with respect to the maximum likelihood estimator. Here we use data depth- 
based spatial ranks in place of spatial signs to obtain the orthogonally equivariant Depth 
Covariance Matrix (DCM) and use its eigenvector estimates for PCA. We derive asymp¬ 
totic properties of the sample DCM and influence functions of its eigenvectors. The shapes 
of these influence functions indicate robustness of estimated principal components, and 
good efficiency properties compared to the SCM. Finite sample simulation studies show 
that principal components of the sample DCM are robust with respect to deviations from 
normality, as well as are more efficient than the SCM and its affine equivariant version, 
Tyler’s shape matrix. Through two real data examples, we also show the effectiveness of 
DCM-based PCA in analyzing high-dimensional data and outlier detection, and compare 
it with other methods of robust PCA. 

Keywords: Data depth; Principal components analysis; Robustness; Sign covariance 
matrix; Multivariate ranking 
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1 Introduction 


In multivariate analysis, the study of principal components is important since it provides 
a small number of uncorrelated variables from a potentially larger number of variables, 
so that these new components explain most of the underlying variability in the original 
data. In case of multivariate normal distribution, the sample covariance matrix provides 
the most asymptotically efficient estimates of eigenvectors/ principal components, but it 
is extremely sensitive to outliers as well as relaxations of the normality assumption. To 
address this issue, several robust estimators of the population covariance or correlation 
matrix have been proposed which can be used for Principal Components Analysis (PCA). 
They can be roughly put into these categories: robust, high breakdown point estimators 


that are computation-intensive (Maronna et al., 1976 Rousseeuw, 1985); M-estimators 
that are calculated by simple iterative algorithms but do not necessarily possess high 


breakdown point (Huber, 1977 Tyler 1987); and symmetrised estomators that are highly 
efficient and robust to deviations from normality, but sensitive to outliers and computa¬ 


tionally demanding (Diimbgen, 1998 Sirkia et al. ( 2007). 

When principal components are of interest, one can also estimate the population eigen¬ 
vectors by analyzing the spatial sign of a multivariate vector: the vector divided by its 
magnitude, instead of the original data. The covariance matrix of these sign vectors, 
namely Sign Covariance Matrix (SCM) has the same set of eigenvectors as the covariance 
matrix of the original population, thus the multivariate sign transformation yields com¬ 


putationally simple and high-breakdown estimates of principal components (Locantore 


et al., 1999 Visuri et al., 2000|). Although the SCM is not affine equivariant, its orthog¬ 


onal equivariance suffices for the purpose of PCA. However, the resulting estimates are 


not very efficient, and are in fact asymptotically inadmissible (Magyar and Tyler, 2014), 
in the sense that there is an estimator (Tyler’s M-estimate of scatter, to be precise) that 
has uniformly lower asymptotic risk than the SCM. 


The nonparametric concept of data-depth had first been proposed by Tukey (1975) 
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when he introduced the halfspace depth. Given a dataset, the depth of a given point in 
the sample space measures how far inside the data cloud the point exists. An overview of 


statistical depth functions can be found in (Zuo and Serfling 2000). Depth-based methods 


have recently been popular for robust nonparametric classification (Dutta and Ghosh 


2012; Ghosh and Chaudhuri 2005; Jornsten, 2004; Sguera et al., 2014). In parametric 


estimation, depth-weighted means (Zuo et al., 2004) and covariance matrices (Zuo and 


Cui 2005) provide high-breakdown point as well as efficient estimators, although they 


do involve choice of a suitable weight function and tuning parameters. In this paper we 
study the covariance matrix of the multivariate rank vector that is obtained from the 
data-depth of a point and its spatial sign, paying special attention to its eigenvectors. 
Specifically, we develop a robust version of principal components analysis for elliptically 
symmetric distributions based on the eigenvectors of this covariance matrix, and compare 
it with normal PCA and spherical PCA, i.e. PCA based on eigenvectors of the SCM. 

The chapter is arranged in the following fashion. Section [2] provides preliminary theo¬ 
retical concepts required for developments in the subsequent sections. Section [3] introduces 
the Depth Covariance Matrix (DCM) and states some basic results related to this. Section 
[5] provides asymptotic results regarding the sample DCM, calculated using data depths 
with respect to the empirical distribution function, as well as its eigenvectors and eigenval¬ 
ues. Section [5] focuses solely on principal component estimation using the sample DCM. 
We obtain influence functions and asymptotic efficiencies for eigenvectors of the DCM. 
We also compare their finite sample efficiencies for several multinormal and multivariate 
t-distributions with those of the SCM, Tyler’s scatter matrix and its depth-weighted ver¬ 
sion through a simulation study. Section [6] presents two applications of the methods we 
develop on real data. Finally, we wrap up our discussion in Section [7] by giving a summary 
of our findings and providing some potential future areas of research. Appendices [A] and 
[B] contain all technical details and proofs of the results we derive. 
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2 Preliminaries 


2.1 Spatial signs and sign covariance matrix 


Given a vector x £ 


its spatial sign is defined as the vector valued function (Mottonen 


and Oja 


1995 


S(x) 


x 11 x || 1 ifx/O 

< 

0 if x = 0 


When x is a random vector that follows an elliptic distribution |E| _1//2 /((x —p) T S _1 (x — 
/x)), with a mean vector /x and covariance matrix £, the sign vectors S(x — /x) reside 
on the surface of a p-dimensional unit ball centered at /x. Denote by £g(X) = PS(X — 
/x)S(X—/x) T the covariance matrix of spatial signs, or the Sign Covariance Matrix (SCM). 
The transformation x i—S(x — /x) keeps eigenvectors of population covariance matrix 
unchanged, and eigenvectors of the sample SCM Eg = Yll-i S(xj — /x)S(xj — /x) T /n are 
y^-consistent estimators of their population counterparts (Taskinen et al. 2012). 

The sign transformation is rotation equivariant, i.e. S(P(x — /x)) = P(x — /x)/1|P(x — 
/x) 1 1 = P(x — /x)/||x — /x|| = PS(x — /x) for any orthogonal matrix P, and as a result 
the SCM is rotation equivariant too, in the sense that Eg(PX) = PEg(X)P 7 . This is 
not necessarily true in general if P is replaced by any non-singular matrix. An affine 
equivariant version of the sample SCM is obtained as the solution E^ of the following 
equation: 


(Taskinen et al. 2012 


y (x-p)(x-/x) r 

T n ( x - m) t S t (X)- 1 (x - /x) 

which turns out to be Tyler’s M-estimator of scatter (Tyler, 1987). In this context, one 
should note that for scatter matrices, affine equivariance will mean any affine transfor¬ 
mation on the original random variable X H > X* = AX + b (A non-singular, b £ MP) 
being carried over to the covariance matrix estimate upto a scalar multiple: St(X*) = 
k.AYiT(K)A T for some k > 0. 
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2.2 Data depth and outlyingness 


For any multivariate distribution F = Fx belonging to a set of distributions F, the depth 
of a point x£R p , say D(x, Fx) is any real-valued function that provides a ’center outward 


ordering’ of x with respect to F (Zuo and Serfling, 2000). Liu (1990) outlines the desirable 
properties of a statistical depth function: 


(Dl) Affine invariance-. D(Ax + b, F^x+b) = F(x, Fx); 

(D2) Maximality at center: D(0,Fx) = sup xg]RP D(x, Fx) for Fx having center of sym¬ 
metry 6. This point is called the deepest point of the distribution.; 

(D3) Monotonicity with respect to deepest point: F>(x;Fx) < D{6 + a(x — 0),Fx), 0 
being deepest point of Fx-; 

(D4) Vanishing at infinity: F>(x;Fx) —> 0 as ||x|| —> oo. 

In (D2) the types of symmetry considered can be central symmetry, angular symmetry 
and halfspace symmetry. Also for multimodal probability distributions, i.e. distributions 
with multiple local maxima in their probability density functions, properties (D2) and 
(D3) are actually restrictive towards the formulation of a reasonable depth function that 
captures the shape of the data cloud. In our derivations that follow, we replace these two 
by a weaker condition: 


(D2*) Existence of a maximal point: The maximum depth over all distributions F and 
points x is bounded above, i.e. sup Fxg p sup xgRP D(x, Fx) < oo. We denote this point by 
Md(Fx)- 


A real-valued function measuring the outlyingness of a point with respect to the data 
cloud can be seen as the opposite of what data depth does. Indeed, such functions have 
been used to define several depth functions, for example simplicial depth, projection depth 
and Fp-depth. Keeping with the spirit of the utility of these functions we name them 
‘htped’: literally the reverse of ‘depth’, and give a general definition of such functions as 
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a transformation on any depth function: 

Definition 2.1. Given a random variable X following a probability distribution F, and a 
depth function D(.,.), we define Htped of a point x as: F>(x, F) = h{d x ) as any function 
of the data depth F(x, F) = d x so that h(d x ) is bounded, monotonically decreasing in d x 
and sup x l)(x, F) < oo. 

For a fixed depth function, there are several choices of a corresponding htped. We 
develop our theory assuming a general htped function, but for the plots and simulations, 
fix our htped as F>(x, F) = Mjj(F) — F(x, F), i.e. simply subtract the depth of a point 
from the maximum possible depth over all points in sample space. 

We will be using the following 3 measures of data-depth to obtain our DCMs and 
compare their performances: 


Halfspace depth (HD) (Tukey 1975) is defined as the minimum probability of all 
halfspaces containing a point. In our notations, 


HD(x,F)= inf P(u T X > u T x) 

ueRP;u^O 


Mahalanobis depth (MhD) (Liu et al. 1999) is based on the Mahalanobis distance 


of x to p with respect to E: ds(x, p) = y/(x — pFs- 1 (x — n) . It is defined 


as 


MhD(X,F)= TT 7j F—^ 

note here that ds(x, n) can be seen as a valid htped function of x with respect to 
F. 


Projection depth (PD) (Zuo, 2003) is another depth function based on an outly¬ 
ingness function. Here that function is 


0(x, F) 


sup 
u|| = l 


u T x — m( u 7 X)| 
s(u T X) 
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where m and s are some univariate measures location and scale, respectively. Given 
this the depth at x is defined as PD(x,F) = 1/(1 + 0(x, F)). 


Computation-wise, MhD is easy to calculate since the sample mean and covariance 
matrix are generally used as estimates of p, and X, respectively. However this makes MhD 
less robust with respect to outliers. PD is generally approximated by taking maximum 
over a number of random projections. There have been several approaches for calculating 
HD. A recent unpublished paper ( Rainer and Mozharovskyi| 2014) provides a general 
algorithm that computes exact HD in 0(n p_1 log n) time. In this paper, we shall use 
inbuilt functions in the R package fda.use for calculating the above depth functions. 


3 Depth-based rank covariance matrix 


Consider a vector-valued random variable X G M p . Data depth is as much a prop¬ 
erty of the random variable as it is of the underlying distribution, so for ease of nota¬ 
tion while working with transformed random variables, from now on we shall be using 
D x (x) = D(x, F) to denote the depth of a point x. Now, given a depth function Dx(x) 
(equivalently, an htped function Dx(x) = D(x,F)), transform the original random vari¬ 
able as: x = Z?x(x)S(x — //), S(.) being the spatial sign functional. The transformed 


random variable X can be seen as the multivariate rank corresponding to X (e.g. Ser 


fling (2006)). The notion of multivariate ranks goes back to Puri and Sen (1971), where 


they take the vector consisting of marginal univariate ranks as multivariate rank vector. 


Subsequent definitions of multivariate ranks were proposed by Hallin and Paindaveine 


(2002); Mottonen and Oja (1995) and Chernozhukov et al. (2014). Compared to these 


formulations, our definition of multivariate ranks works for any general depth function, 
and provides an intuitive extension to any spatial sign-based methodology. 

Figure [l] gives an idea of how the multivariate rank vector X is distributed when X 
has a bivariate normal distribution. Compared to the spatial sign, which are distributed 
on the surface of p-dimensional unit ball centered at /x, these spatial ranks have the same 
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Figure 1: (Left) 1000 points randomly drawn from A /2 ((0, 0) T , ( _i 5 4 ~ 4 )) and (Right) their 
multivariate ranks based on halfspace depth 

direction as original data and reside inside the p-dimensional ball around /1 that has 
radius Md(F) (which, for the case of halfspace depth, equals 0.5). Any outlying samples 
situated far away from the data cloud (represented by red points in the figure) are mapped 
close to the boundary of the p-dimensional ball after the rank transformation. 

Now consider the spectral decomposition for the covariance matrix of F: S = rAr T , 
r being orthogonal and A diagonal with positive diagonal elements. Also normalize the 
original random variable as z = r T A _1 / 2 (x — fi). In this setup, we can represent the 
transformed random variable as 

x = Z) x (x)S(X - fi) 

= -^ > rA 1 /2 Z + f i( r A 1/2z + /*)-S(rA 1/2 z) 

= ri) z (z)S(A 1/2 z) 

= FA^CzWSfz) (!) 

£>z{ z) is an even function in z because of affine invariance, as is ||z||/||A 1//2 z||. Since 
S(z) is odd in z for circularly symmetric z, it follows that F7(X) = 0 , and consequently 
we obtain an expression for the covariance matrix of X: 



-0.4 0.0 0.2 0.4 

Xrank[1:1000,1[,1] 
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Theorem 3.1. Let the random variable X 6 M p follow an elliptical distribution with 
center pi and covariance matrix £ = TAr 7 , its spectral decomposition. Then, given a 
depth function -Dx(-) the covariance matrix of the transformed random variable X is 


CovfX.) = TA D)S T t , with A d,s = E z 


Pz(z)) : 


, A 1 / 2 zz' r A 1 / 2 
z T Az 


( 2 ) 


where z = (zi, ...,z p ) T ~ X(0,/ p ), so that Ad,s a diagonal matrix with diagonal entries 


Ad,s,j = Ez 


(l)z(z)) 2 A i z, 




The matrix of eigenvectors of the covariance matrix, T, remains unchanged in the 
transformation X —» X. As a result, the multivariate rank vectors can be used for robust 
principal component analysis, which will be outlined in the following sections. However, 
as one can see in the above expression, the diagonal entries of Ad s are not same as those 
of A, i.e. the actual eigenvalues. This is the reason for lack of affine equavariance of the 


DCM. Following the case of multivariate sign covariance matrices (Taskinen et al. 2012) 


one can get back the shape components, i.e. original standardized eigenvalues A* from 
Ad,s by an iterative algorithm: 

1. Set k = 0, and start with an initial value A*(°). 

2. Calculate the next iterate 


A*( fc +i) = 


and standardize its eigenvalues: 


E, 


(T>z(z)) 2 zz 

z r A*( fc )i 


T 


A*( fc +1) _ 


A*( fc+1 ) 


Ad,& 


det(A*( fc+1 ))i/p 


3. Stop if convergence criterion is satisfied. Otherwise set k —^ k + 1 ad go to step 2. 
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Unlike sign covariance matrices and symmetrized sign covariance matrices (Diimbgen 


1998), however, attempting to derive an affine equivariant counterpart (as opposed to 


only orthogonal equivariance) of the DCM through an iterative approach analogous to 


Tyler (1987) will not result in anything new. This is because Tyler’s scatter matrix T,t 


is defined as the implicit solution to the following equation: 


S7 1 = IE 


(x~ aQ(x-/j) t 
( x- /i) T S^ 1 (x- /x) 


( 3 ) 


and simply replacing x by its multivariate rank counterpart x will not change the estimate 
Tit as x and x have the same directions. Instead we consider a depth-weighted version 
of Tyler’s scatter matrix (i.e. weights (Z)x( x )) 2 in right side of ([ 3 ])) in the simulations in 
Section[5j The simulations show that it has slightly better finite-sample efficiency than Sy 
but has same asymptotic performance. We conjecture that its concentration properties 


can be obtained by taking an approach similar to Soloveychik and Wiesel (2014). 


4 Asymptotic results 

4.1 The sample DCM 

Let us now consider n iid random draws from our elliptic distribution F. say Xi, ...,X n . 
For ease of notation, denote S5(x;p) = S(x — /a)S(x — fi) T . Then, given the depth 
function and known location center /x, one can show that the vectorized form of i/n-times 
the sample DCM: 2™ = i(.Dx(xi)) 2 S'S , (xj; fi)/y/n has an asymptotic multivariate normal 
distribution with mean y / n.nec(T , [((Z)x(X)) 2 «S'5'(x; p)]) and a certain covariance matrix 
by straightforward application of the central limit theorem (CLT). But in practice the 
population depth function Dx(x) = D(x, F) is estimated by the depth function based on 
the empirical distribution function, F n . Denote this sample depth by D^-(x) = D(x,F n ). 
Here we make the following assumption regarding its relation to Dx(x): 

(D5) Uniform convergence : sup xgKP |D^(x) — Z?x(x)| —> 0 as n —> 00 . 
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The assumption that empirical depths converge uniformly at all points x to their pop¬ 
ulation versions holds under very mild conditions for several well known depth functions: 
for example projection depth (Zuo, 2003) and simplicial depth (Diimbgen 1992). One 
also needs to replace the known location parameter /x by some estimator fi n . Examples 


of robust estimators of location that are relevant here include the spatial median (Brown 


1983; Haldane, 1948), Oja median (Oja, 1983), projection median (Zuo, 2003) etc. Now, 


given D^(-) and fi n , to plug them into the sample DCM and still go through with the 
CLT we need the following result: 


Lemma 4.1. Consider a random variable XeP having a continuous and symmetric 
distribution with location center /i, such that E ||x —/i|| -3 / 2 < oo. Given n random samples 
from this distribution, suppose fj, n is an estimator of fx so that y/n(fi n — /z) = Op(l). Then 
with the above notations, and given the assumption (D5) we have 


\/n 


1 

n 


2=1 


1 

- J2(D^)) 2 SS(^ h ,) 


We are now in a position to state the result for consistency of the sample DCM: 


Theorem 4.2. Consider n iid samples from the distribution in Lemma \f.l\ Then, given 
a depth function Dx(.) and an estimate of center fi n so that y/n{jl n — /*) = Op(l), 


vec < — 
n 


^(^(x i )) 2 55(x ? ;;A, 


- E 


2=1 


vec {(D x (x)) 2 55(x;/x)} 


D 


Np2 (0, V DiS {F)) 


with Vd,s(F) = Var 


uec |(Z) x (x)) 2 S'5(x; /i)| 


In case F is elliptical, an elaborate form of the covariance matrix Vd,s{F) explic¬ 
itly specifying each of its elements (more directly those of its T T -rotated version) can 
be obtained, which is given in Appendix [Aj This form is useful when deriving limiting 
distributions of eigenvectors and eigenvalues of the sample DCM. 
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4.2 Eigenvectors and eigenvalues 


Since we are mainly interested in using the DCM for a robust version of principal com¬ 
ponents analysis, from now on we assume that the eigenvalues of £ are distinct: Ai > 
A 2 > ... > X p to obtain asymptotic distributions of principal components. In the case of 
eigenvalues with larger than 1 multiplicities, the limiting distributions of eigenprojection 


matrices can be obtained analogous to those of the sign covariance matrix (Magyar and 


Tyler 2014). 


We now derive the asymptotic joint distributions of eigenvectors and eigenvalues of the 
sample DCM. The following result allows us to get these, provided we know the limiting 
distribution of the sample DCM itself: 


Theorem 4.3. Taskinen et al., 2012) Let F\ be an elliptical distribution with a diagonal 
covariance matrix A, and C be any positive definite symmetric p x p matrix such that at 
F\ the limiting distribution of y/nvec(C — A) is ap 2 -variate (singular) normal distribution 
with mean zero. Write the spectral decomposition of C as C = PAP T . Then the limiting 
distributions of \Jnvec[P — I p ) and ^/nvec (A —A) are multivariate (singular) normal and 


\fnvec{C — A) = [(A <g) I p ) — ( I p <g) A)] \/nvec{P — I p ) + y/nvec( A — A) + op(l) (4) 

The first matrix picks only off-diagonal elements of the LHS and the second one only 
diagonal elements. We shall now use this as well as the form of the asymptotic covariance 
matrix of the vec of sample DCM, i.e. Vd,s{F ) to obtain limiting variance and covariances 
of eigenvalues and eigenvectors. 

Corollary 4.4. Consider the sample DCM S D (F) = An)/ n ant l 

its spectral decomposition S D (F) = TpApT^. Then the matrices G = s/n(T d — T) 
and L = y / n(Ap — A d,s) have independent distributions. The random variable vec(G) 
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asymptotically has a p 2 -variate normal distribution with mean 0 p 2 , and the asymptotic 
variance and covariance of different columns of G = (gi, ...,g p ) are as follows: 


AVar(gi ) 


E 


k=l]k^i 


. (A D,s,k ~ A D,S,i)' 


-E 


(D z(z)) AjA kZjZ k 

(z T Az) 2 


Ikll 


(5) 


ACov(gi,gj ) = 


(A_D,s,i - A D,S,j) 2 


E 


(Dz^yX^zfz 2 

(z T Az) 2 


ijij ; i 


( 6 ) 


where T = ( 7 1; ..., 7 p ). The vector consisting of diagonal elements of L, say 1= (li,...,l p ) T 
asymptotically has a p-variate normal distribution with mean O p and variance-covariance 
elements: 


AVar(lf) 


ACov(li, lj ) 


A 

E 


(PzM) 4 A -7 
(z T Az) 2 

(Dzfz^YXjXjzfz 2 

(z T Az) 2 


2 

D,S,i 


Xd,S,i^D, 


S,ji 


i + j 


(7) 

( 8 ) 


5 Robustness and efficiency properties 


In this section, we first obtain the influence functions of the DCM as well as its eigen¬ 
vectors and eigenvalues, which are essential to understand how much influence a sample 
point, especially an infinitesimal contamination, has on any functional on the distribution 


(Hampel et al. 1986). We also derive the asymptotic efficiencies of individual principal 


components with respect to those of the original covariance matrix and sign covariance 
matrix. Unlike affine equivariant estimators of shape, the Asymptotic Relative Efficiency 
(ARE) of eigenvectors (with respect to any other affine equivariant estimator) can not be 
simplified as a ratio of two scalar quantities dependent on only the distribution of ||z|| 


(e.g. Ollilia et al. (2003); Taskinen et al. (2012)). Finite sample efficiency of the DCM 
estimates with respect to infinitesimal contamination and heavy-tailed distributions shall 
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also be demonstrated by a simulation study. 


5.1 Influence functions 

Given any probability distribution F, the influence function of any point xo in the sample 
space X for some functional T(F ) on the distribution is defined as 

IF(xq’, T, F) = lim -(T{F e ) - T(F)) 
e ->0 e 

where F e is F with an additional mass of e at xo, i.e. F e = (1 — e)F + eA X0 ; A xo being the 
distribution with point mass at xo- When T(F) = Epg for some T-integrable function g , 
LF(x 0 ; T , F) = g(xo) — T(F). It now follows that for the DCM, 


LF(x 0 ; Cov(X), F) = (D x (x 0 )) 2 SS(x 0 ; y) - Cov(X) 


Following 


Croux and Haesbroeck 


( 2000 ), we now get the influence function of the i 


th 


column of V D = ( 7 D1 , ..., 7 DtP )\i = 1 , 

P x 

/F(x 0 ;7 Di ,F) = 7-7- 

k =w Xd ^ ~ X °’ S ’ k 


E 


1 


t =W, Ad ’^ “ 

y- \/AtAfc^o^ofc (-Pz( z o )) 2 
- A D ,s,k z o Az 0 


{lk IF (*o; Cov(X), 7 j} 7 fc 
{ 7 fc(Gx(x 0 )) 2 55 (xo; y)^ - A D ,s,i7k7ij Ik 


(9) 


where T r A _1//2 (xo — y) = zq = (zqi, zo p ) T . Clearly this influence function will be 
bounded, which indicates good robustness properties of principal components. Moreover, 
since the htped function takes small values for points close to the center of the distribution, 
it does not suffer from the inlier effect that is typical of the SCM and Tyler’s shape matrix. 
The influence function for the i th eigenvector estimates of these two matrices (say 7 ^ i 
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and 7 T j, respectively) are as follows: 


/F(x 0 ; 7 5 )i ,F) 

/F(x 0 ; 7 T)i ,F) 


\/AiAfc 


ZOi^Ok 


y 

fc=Wi A5 ’' ~ As ’ fc z o Az ° 

/ . oX V^i^k ZQiZOk 

{p + 2) y -p .^^7 fc 

^ . Ai - zl z 0 


7fc> with A 5;i = E z 




XU V 


k=l;kj^i 


for i = 1,2. In Figure [2] we consider first eigenvectors of different scatter estimates for 
the A/ 2 (( 0 , 0 ) t , diag( 2 , 1 )) and plot norms of these influence functions for different values 
of xo- The plots for SCM and Tyler’s shape matrix demonstrate the ’inlier effect’, i.e. 
points close to symmetry center and the center itself having high influence. The influence 
function for the sample covariance matrix is obtained by replacing (p + 2 ) by ||zo || 2 in the 
expression of /F(xq; 7 r ,, F) above, hence is unbounded and the corresponding eigenvector 
estimators are not robust. In comparison, all three DCMs considered here have a bounded 
influence function as well as small values of the influence function at ’deep’ points. 


5.2 Asymptotic and finite-sample efficiencies 


Suppose X is a y^-consistent estimator of the population covariance matrix X, which 
permits a spectral decomposition X = TAr T , where T = ( 77 , ..., 7 p ). Then the asymptotic 


variance of the eigenvectors are (see Theorem 13.5.1 in Anderson (3rd ed. 2003)) 


AVar^Ji) = y 


AAfc t 
ilklk 


k=l\kj^i 


(Ai - Afc) 2 


( 10 ) 


The asymptotic relative efficiencies of eigenvectors from the sample DCM with respect to 
the sample covariance matrix can now be derived using (10) above and ([5]) from Corollary 
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(a) 


<b| 



Figure 2: Plot of the norm of influence function for first eigenvector of (a) sample co- 
variance matrix, (b) SCM, (c) Tyler’s scatter matrix and DCMs for (d) Halfspace depth, 
(e) Mahalanobis depth, (f) Projection depth for a bivariate normal distribution with 
/x = 0, £ = diag(2,1) 
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53 


AREtfP^F) 


TV (AVar(y/n-y i )) 
Tr (AVar(y/n'y ?)) 


E 




(A i — Afc ) 2 


E 






(AD,s,i “ AD,S,fc) 


;E 


( (D z (z)rzfzl \ 

y (z T Az ) 2 y 


For 2 dimensions, this expression can be somewhat simplified. Suppose the two eigen¬ 
values are A and pX. In that case the eigenvalues of the DCM are 


Ad,s,i = E 


( (DzWfzf ] 

\ z l + p4 J ’ 


A D,s ,2 = E 


( (P z (z)) 2 pz% \ 

V z l + p 4 J 


and by simple algebra we get 


ARE^^-F) = ARE (^2 > 7 2 5 F) 


1 

(wF 


p ( (bzA)) 2 (~j-p4) 

V (4+p4) 

(D z (z))^ \ 

(4+p4) 2 J 



2 


For p = 0.5, Table [I] below considers 6 different elliptic distributions (namely, bivari¬ 
ate t with df = 5,6,10,15,25 and bivariate normal), and summarizes the ARE for first 
eigenvector of the SCM, Tyler’s scatter matrix and DCM for 3 choices of depth function 
(HSD-CM, MhD-CM, PD-CM: columns 3-5), as well as their depth-weighted versions 
mentioned at the end of section [3] (HSD-wCM, MhD-wCM, PD-wCM: columns 6-8). The 
SCM and Tyler’s M-estimator perform better than the sample covariance matrix only for 
a bivariate f-distribution with df = 5. Estimates based on depth-based covariance matri¬ 
ces have much better performances for all distributions, and continue to be competitive 
of the sample covariance matrix estimates as the base distribution approaches normality, 
especially those based on projection depth. Interestingly, depth-weighted versions seem 
to have better performances than their corresponding DCMs, more so for heavy-tailed 
distributions. 

We now obtain finite sample efficiencies of the three DCMs as well as their depth- 
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SCM 

Tyler 

HSD-CM 

MhD-CM 

PD-CM 

HSD-wCM 

MhD-wCM 

PD-wCM 

Bivariate £5 

1.46 

1.50 

1.97 

1.54 

1.99 

2.09 

1.57 

2.11 

Bivariate te 

0.97 

1.00 

1.37 

1.12 

1.40 

1.45 

1.19 

1.49 

Bivariate £10 

0.65 

0.67 

0.96 

0.88 

1.02 

1.02 

0.94 

1.10 

Bivariate £15 

0.57 

0.59 

0.87 

0.82 

0.94 

0.92 

0.87 

1.00 

Bivariate £25 

0.54 

0.55 

0.79 

0.75 

0.88 

0.85 

0.81 

0.95 

BVN 

0.49 

0.50 

0.73 

0.71 

0.82 

0.77 

0.75 

0.88 


Table 1: Asymptotic efficiencies relative to sample covariance matrix for p = 2 


weighted affine equivariant counterparts by a simulation study, and compare them with 
the same from the SCM and Tyler’s scatter matrix. We consider the same 6 elliptical 
distributions considered in ARE calculations above, and from every distribution draw 
10000 samples each for sample sizes n = 20, 50,100, 300, 500. All distributions are centered 
at 0 p , and have covariance matrix £ = diag(p,p — 1, ...1). We consider 3 choices of p: 2, 
3 and 4. 


We use the concept of principal angles (Miao and Ben-Israel, 1992) to find out error 
estimates for the first eigenvector of a scatter matrix. In our case, the first eigenvector 
will be 

v- 1 

7r = (1-C^) T 


For an estimate of the eigenvector, say qq, error in prediction is measured by the smallest 
angle between the two lines, i.e. cos^ 1 Iqf’qql. A smaller absolute value of this angle 
is equivalent to better prediction. We repeat this 10000 times and calculate the Mean 
Squared Prediction Angle: 


MSPAtfJ 


1 

10000 


10000 


£ 

m =1 



T ~ (m) 
iii i 


2 


Finally, the finite sample efficiency of some eigenvector estimate •yf relative to that ob¬ 
tained from the sample covariance matrix, say y^ ov is obtained as: 


FSE( 7f,7f°") 


MSPA( i^ ov ) 
MSPA(yf) 
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Tables S! and [4] give FSE values for p = 2,3,4, respectively. In general, all the effi¬ 
ciencies increase as the dimension p goes up. DCM-based estimators (columns 3-5 in each 
table) outperform SCM and Tyler’s scatter matrix, and among the 3 depths considered, 
projection depth seems to give the best results. Its finite sample performances are better 
than Tyler’s and Huber’s M-estimators of scatter as well as their symmetrized counter¬ 


parts (see Table 4 in Sirkia et al. (2007), and quite close to the affine equivariant spatial 


sign covariance matrix (see Table 2 in Ollilia et al. (2003)) For p = 2, n = 300, 500 the first 
5 columns of Table [2] approximate the asymptotic efficiencies in Table [l] well, except for 
the multivariate f-distribution with df = 5. Finally, the depth-weighted iterated versions 
of these 3 SCMs (columns 6-8 in each table) seem to further better the performance of 
their corresponding orthogonal equivariant counterparts. 


6 Examples in real data analysis 


6.1 Bus data 


This dataset is available in the R package rrcov, and consists of data on images of 218 
buses. The 18 variables here correspond to several features related to these images. Here 


we extend upon the analysis in Maronna et al. (2006), pp. 213 to compare the classical 


PCA and 3 different methods of robust PCA (including that using SCM) with our DCM- 
based method. Similar to the original analysis, we set aside variable 9 and scale the other 
variables by dividing with their respective median absolute deviations (MAD). This is 
done because all the variables had much larger standard deviations compared to their 
MADs, and variable 9 had MAD = 0. 

For the sake of uniformity, we use projection depth as our fixed depth function while 
doing depth-based PCA in our data analysis examples. We compare the outputs of classi¬ 
cal (CPCA) and depth-based PCA (DPCA) with the following 3 robust methods: Spher¬ 
ical PCA, i.e. PCA based on the SCM (SPCA), PCs obtained by the ROBPCA algo¬ 


rithm (Hubert et al. 2005), and the Minimum Covariance Determinant (MCD) estimator 
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F = Bivariate t$ 

SCM 

Tyler 

HSD-CM 

MhD-CM 

PD-CM 

HSD-wCM 

MhD-wCM 

PD-wCM 

n =20 

0.80 

0.83 

0.95 

0.95 

0.89 

1.00 

0.96 

0.89 

n =50 

0.86 

0.90 

1.25 

1.10 

1.21 

1.32 

1.13 

1.25 

n=100 

1.02 

1.04 

1.58 

1.20 

1.54 

1.67 

1.24 

1.63 

n=300 

1.24 

1.28 

1.81 

1.36 

1.82 

1.93 

1.44 

1.95 

n=500 

1.25 

1.29 

1.80 

1.33 

1.84 

1.91 

1.39 

1.97 

F = Bivariate te 

SCM 

Tyler 

HSD-CM 

MhD-CM 

PD-CM 

HSD-wCM 

MhD-wCM 

PD-wCM 

n =20 

0.77 

0.79 

0.92 

0.92 

0.86 

0.96 

0.92 

0.85 

n=50 

0.76 

0.78 

1.11 

1.00 

1.08 

1.17 

1.03 

1.13 

n= 100 

0.78 

0.79 

1.27 

1.06 

1.33 

1.35 

1.11 

1.41 

n=300 

0.88 

0.91 

1.29 

1.09 

1.35 

1.38 

1.15 

1.45 

n=500 

0.93 

0.96 

1.37 

1.13 

1.40 

1.44 

1.19 

1.48 

F = Bivariate tio 

SCM 

Tyler 

HSD-CM 

MhD-CM 

PD-CM 

HSD-wCM 

MhD-wCM 

PD-wCM 

n =20 

0.70 

0.72 

0.83 

0.84 

0.77 

0.89 

0.87 

0.79 

n =50 

0.58 

0.60 

0.90 

0.84 

0.86 

0.95 

0.88 

0.91 

n=100 

0.57 

0.59 

0.92 

0.87 

0.97 

0.98 

0.90 

1.03 

n=300 

0.62 

0.64 

0.93 

0.85 

0.99 

0.99 

0.91 

1.06 

n=500 

0.62 

0.65 

0.93 

0.86 

1.00 

1.00 

0.92 

1.08 

F = Bivariate tis 

SCM 

Tyler 

HSD-CM 

MhD-CM 

PD-CM 

HSD-wCM 

MhD-wCM 

PD-wCM 

n =20 

0.63 

0.66 

0.76 

0.78 

0.72 

0.81 

0.81 

0.73 

n=50 

0.52 

0.52 

0.79 

0.75 

0.80 

0.84 

0.79 

0.85 

n=100 

0.51 

0.52 

0.83 

0.77 

0.88 

0.88 

0.81 

0.94 

n=300 

0.55 

0.56 

0.84 

0.79 

0.91 

0.89 

0.84 

0.98 

n=500 

0.56 

0.59 

0.85 

0.80 

0.93 

0.91 

0.86 

0.99 

F = Bivariate £25 

SCM 

Tyler 

HSD-CM 

MhD-CM 

PD-CM 

HSD-wCM 

MhD-wCM 

PD-wCM 

n=20 

0.63 

0.65 

0.77 

0.79 

0.74 

0.80 

0.81 

0.74 

n=50 

0.49 

0.50 

0.73 

0.71 

0.76 

0.78 

0.75 

0.80 

n=100 

0.45 

0.46 

0.73 

0.69 

0.81 

0.78 

0.73 

0.87 

n=300 

0.51 

0.52 

0.78 

0.75 

0.87 

0.83 

0.79 

0.94 

n=500 

0.53 

0.55 

0.79 

0.75 

0.87 

0.84 

0.80 

0.94 

F = BVN 

SCM 

Tyler 

HSD-CM 

MhD-CM 

PD-CM 

HSD-wCM 

MhD-wCM 

PD-wCM 

n =20 

0.56 

0.60 

0.69 

0.71 

0.67 

0.73 

0.74 

0.68 

n =50 

0.42 

0.43 

0.66 

0.66 

0.70 

0.71 

0.69 

0.75 

n=100 

0.42 

0.43 

0.69 

0.66 

0.77 

0.74 

0.71 

0.83 

n=300 

0.47 

0.49 

0.71 

0.69 

0.82 

0.76 

0.73 

0.88 

n=500 

0.48 

0.50 

0.73 

0.71 

0.83 

0.78 

0.76 

0.89 


Table 2: Finite sample efficiencies of several scatter matrices: p = 2 
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3-variate ts 

SCM 

Tyler 

HSD-CM 

MhD-CM 

PD-CM 

HSD-wCM 

MhD-wCM 

PD-wCM 

n =20 

0.96 

0.97 

1.06 

1.03 

0.99 

1.07 

1.06 

0.97 

O 

IlO 

II 

e 

1.07 

1.08 

1.28 

1.20 

1.18 

1.33 

1.23 

1.20 

n =100 

1.12 

1.15 

1.49 

1.31 

1.40 

1.57 

1.38 

1.48 

n=300 

1.49 

1.54 

2.09 

1.82 

2.07 

2.19 

1.93 

2.18 

n=500 

1.60 

1.66 

2.18 

1.87 

2.21 

2.27 

1.95 

2.30 

3-variate t& 

SCM 

Tyler 

HSD-CM 

MhD-CM 

PD-CM 

HSD-wCM 

MhD-wCM 

PD-wCM 

n =20 

0.90 

0.92 

1.00 

0.99 

0.95 

1.02 

1.01 

0.94 

O 

O 

II 

e 

0.95 

0.96 

1.16 

1.09 

1.09 

1.21 

1.14 

1.11 

n =100 

0.98 

0.99 

1.32 

1.22 

1.25 

1.38 

1.27 

1.29 

n=300 

1.10 

1.14 

1.57 

1.40 

1.58 

1.62 

1.47 

1.64 

n=500 

1.17 

1.20 

1.57 

1.43 

1.60 

1.63 

1.51 

1.67 

3-variate t io 

SCM 

Tyler 

HSD-CM 

MhD-CM 

PD-CM 

HSD-wCM 

MhD-wCM 

PD-wCM 

n =20 

0.87 

0.88 

0.95 

0.94 

0.90 

0.97 

0.98 

0.89 

O 

to 

II 

e 

0.77 

0.79 

0.96 

0.92 

0.94 

0.99 

0.96 

0.95 

n =100 

0.75 

0.76 

1.02 

0.95 

1.01 

1.06 

1.00 

1.05 

n=300 

0.73 

0.75 

1.03 

0.98 

1.10 

1.08 

1.03 

1.15 

n=500 

0.73 

0.76 

1.02 

0.98 

1.09 

1.06 

1.02 

1.14 

3-variate 1 15 

SCM 

Tyler 

HSD-CM 

MhD-CM 

PD-CM 

HSD-wCM 

MhD-wCM 

PD-wCM 

n =20 

0.84 

0.86 

0.92 

0.92 

0.89 

0.94 

0.94 

0.87 

O 

to 

II 

s 

0.75 

0.76 

0.92 

0.90 

0.90 

0.96 

0.94 

0.93 

n =100 

0.66 

0.67 

0.91 

0.87 

0.95 

0.96 

0.92 

1.00 

n=300 

0.61 

0.64 

0.90 

0.87 

1.00 

0.93 

0.91 

1.04 

n=500 

0.65 

0.67 

0.89 

0.87 

0.99 

0.93 

0.91 

1.03 

3-variate £25 

SCM 

Tyler 

HSD-CM 

MhD-CM 

PD-CM 

HSD-wCM 

MhD-wCM 

PD-wCM 

n =20 

0.78 

0.79 

0.87 

0.89 

0.87 

0.89 

0.92 

0.86 

O 

to 

II 

e 

0.70 

0.71 

0.88 

0.86 

0.88 

0.91 

0.90 

0.90 

n =100 

0.61 

0.63 

0.86 

0.83 

0.89 

0.90 

0.88 

0.94 

n=300 

0.58 

0.59 

0.83 

0.80 

0.92 

0.87 

0.85 

0.98 

n=500 

0.62 

0.64 

0.83 

0.82 

0.94 

0.88 

0.87 

0.99 

3-variate Normal 

SCM 

Tyler 

HSD-CM 

MhD-CM 

PD-CM 

HSD-wCM 

MhD-wCM 

PD-wCM 

n =20 

0.76 

0.78 

0.85 

0.87 

0.84 

0.87 

0.90 

0.83 

O 

to 

II 

e 

0.66 

0.67 

0.82 

0.81 

0.84 

0.86 

0.86 

0.86 

71=100 

0.56 

0.58 

0.77 

0.75 

0.83 

0.82 

0.79 

0.87 

n=300 

0.53 

0.55 

0.75 

0.74 

0.85 

0.79 

0.78 

0.90 

71=500 

0.56 

0.58 

0.76 

0.76 

0.87 

0.80 

0.80 

0.92 


Table 3: Finite sample efficiencies of several scatter matrices: p = 3 
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4-variate ts 

SCM 

Tyler 

HSD-CM 

MhD-CM 

PD-CM 

HSD-wCM 

MhD-wCM 

PD-wCM 

n=20 

1.04 

1.02 

1.10 

1.07 

1.02 

1.09 

1.07 

0.98 

O 

IlO 

II 

£ 

1.08 

1.08 

1.16 

1.16 

1.13 

1.19 

1.19 

1.13 

n=100 

1.31 

1.31 

1.42 

1.38 

1.36 

1.46 

1.44 

1.36 

n=300 

1.46 

1.54 

1.81 

1.76 

1.95 

1.88 

1.88 

1.95 

n=500 

1.92 

1.93 

2.23 

2.03 

2.31 

2.35 

2.19 

2.39 

4-variate t& 

SCM 

Tyler 

HSD-CM 

MhD-CM 

PD-CM 

HSD-wCM 

MhD-wCM 

PD-wCM 

n=20 

1.00 

1.05 

1.03 

1.05 

1.00 

1.04 

1.04 

0.95 

O 

O 

II 

£ 

1.03 

1.01 

1.13 

1.12 

1.11 

1.19 

1.17 

1.10 

n=100 

1.08 

1.12 

1.25 

1.23 

1.27 

1.24 

1.25 

1.22 

n=300 

1.34 

1.36 

1.64 

1.52 

1.60 

1.67 

1.61 

1.68 

n=500 

1.26 

1.34 

1.55 

1.49 

1.60 

1.65 

1.61 

1.69 

4-variate t io 

SCM 

Tyler 

HSD-CM 

MhD-CM 

PD-CM 

HSD-wCM 

MhD-wCM 

PD-wCM 

n=20 

0.90 

0.89 

0.95 

0.98 

0.98 

0.96 

1.01 

0.95 

O 

to 

II 

£ 

0.90 

0.91 

1.01 

0.98 

0.98 

1.03 

1.04 

0.99 

n=100 

0.87 

0.87 

0.93 

0.95 

1.01 

0.99 

1.01 

1.05 

n=300 

0.87 

0.87 

1.09 

1.09 

1.17 

1.14 

1.16 

1.23 

n=500 

0.88 

0.92 

1.10 

1.10 

1.23 

1.19 

1.22 

1.29 

4-variate 1 15 

SCM 

Tyler 

HSD-CM 

MhD-CM 

PD-CM 

HSD-wCM 

MhD-wCM 

PD-wCM 

n=20 

0.92 

0.90 

0.94 

0.94 

0.96 

0.95 

0.97 

0.89 

O 

to 

II 

£ 

0.82 

0.83 

0.88 

0.91 

0.93 

0.88 

0.93 

0.93 

n=100 

0.84 

0.87 

0.92 

0.95 

1.00 

0.93 

0.96 

1.00 

n=300 

0.73 

0.75 

0.96 

0.99 

1.10 

1.00 

1.06 

1.12 

n=500 

0.73 

0.76 

0.95 

0.96 

1.06 

0.94 

0.97 

1.06 

4-variate £25 

SCM 

Tyler 

HSD-CM 

MhD-CM 

PD-CM 

HSD-wCM 

MhD-wCM 

PD-wCM 

n=20 

0.89 

0.92 

0.92 

0.92 

0.90 

0.96 

0.95 

0.89 

O 

to 

II 

£ 

0.82 

0.84 

0.89 

0.90 

0.91 

0.93 

0.96 

0.92 

n=100 

0.77 

0.76 

0.90 

0.90 

0.96 

0.94 

0.98 

1.04 

n=300 

0.73 

0.77 

0.93 

0.91 

0.98 

1.00 

0.98 

1.03 

n=500 

0.67 

0.71 

0.83 

0.83 

0.96 

0.88 

0.90 

1.00 

4-variate Normal 

SCM 

Tyler 

HSD-CM 

MhD-CM 

PD-CM 

HSD-wCM 

MhD-wCM 

PD-wCM 

n=20 

0.82 

0.84 

0.87 

0.90 

0.91 

0.89 

0.93 

0.89 

O 

to 

II 

£ 

0.80 

0.81 

0.87 

0.88 

0.88 

0.88 

0.92 

0.88 

71=100 

0.68 

0.71 

0.80 

0.85 

0.91 

0.82 

0.86 

0.92 

n=300 

0.61 

0.63 

0.82 

0.85 

0.93 

0.86 

0.91 

0.96 

71=500 

0.60 

0.64 

0.77 

0.80 

0.90 

0.82 

0.86 

0.96 


Table 4: Finite sample efficiencies of several scatter matrices: p = 4 
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<1 

Method of PCA 


CPCA 

SPCA 

ROBPCA 

MPCA 

DPCA 

1 

0.188 

0.549 

0.410 

0.514 

0.662 

2 

0.084 

0.272 

0.214 

0.337 

0.359 

3 

0.044 

0.182 

0.121 

0.227 

0.237 

4 

0.026 

0.135 

0.083 

0.154 

0.173 

5 

0.018 

0.099 

0.054 

0.098 

0.115 

6 

0.012 

0.069 

0.036 

0.070 

0.084 


Table 5: Unexplained proportions of variability by PCA models with q components for 
bus data 


(Rousseeuw 1984) (MPCA). Table [5] gives the proportions of variability that are left un¬ 
explained after the top q (= 1, ...,6) components are taken into account in each of the 5 
methods. The first PC in classical PCA seems to explain a much higher proportion of 


variability in original data than robust methods. However, as noted in Maronna et al. 


(2006) and Hubert et al. (2005), this is a result of the classical variances being inflated 
due to outliers in the direction of the first principal axis. Among the robust methods, the 
proportions of unexplained variances are highest for DCM-based PCA for all values of q. 

Table [6] demonstrates why robust methods actually give a better representation of the 
underlying data structure than classical PCA here. Each of its column lists different quan¬ 
tiles of the squared orthogonal distance for a sample point from the hyperplane formed 
by top 3 PCs estimated by the corresponding method. For PCA based on projection- 
DCM, the estimated principal component subspaces are closer to the data than CPCA 
for more than 90% of samples, and the distance only becomes larger for higher quantiles. 
This means that for CPCA, estimated basis vectors of the hyperspace get pulled by ex¬ 
treme outlying points, while the influence of these outliers is very low for DPCA. SPCA 
and ROBPCA perform very closely in this respect, the percentage of points that have 
less squared distance than CPCA being between 80% and 90% for both of them. This 
percentage is only 50% for MPCA, which suggests that the corresponding 3-dimensional 
subspace estimated by MCD is possibly not an accurate representation of the truth. 
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Quantile 

Method of PCA 

CPCA 

SPCA 

ROBPCA 

MPCA 

DPCA 

10% 

1.9 

1.2 

1.2 

1.0 

1.2 

20% 

2.3 

1.6 

1.6 

1.3 

1.6 

30% 

2.8 

1.8 

1.8 

1.7 

1.9 

40% 

3.2 

2.2 

2.1 

2.1 

2.3 

50% 

3.7 

2.6 

2.5 

3.1 

2.6 

60% 

4.4 

3.1 

3.0 

5.9 

3.2 

70% 

5.4 

3.8 

3.9 

25.1 

3.9 

80% 

6.5 

5.2 

4.8 

86.1 

4.8 

90% 

8.2 

9.0 

10.9 

298.2 

6.9 

Max 

24 

1037 

1055 

1037 

980 


Table 6: Quantiles to squared distance from 3-principal component hyperplanes for bus 
data 


6.2 Octane data 


We now apply our method to a high-dimensional dataset and demonstrate its effectiveness 


in outlier detection. Due to Esbensen et al. (1994), this dataset consists of 226 variables 


and 39 observations. Each observation is a gasoline sample with a certain octane number, 
and have their NIR absorbance spectra measured in 2 nm intervals between 1100 - 1550 
nrn. There are 6 outliers here: compounds 25, 26 and 36-39, which contain alcohol. 
Proportions of explained variability by PCs obtained by the two methods are given in 
Figure [3j Once again, the first PC in CPCA explains a much larger proportion of variance 
than DPCA. Second and third PCs obtained by DPCA explain higher proportions of 
variability than the corresponding components in CPCA. 

In both methods, the first two PCs explains a large amount (98% for CPCA, 89% for 
DPCA) of underlying variability. But in DPCA, these PCs are more effective in detecting 
outliers, which we demonstrate in Figure |4j For any method of PCA with k components 
on a dataset of n observations and p variables, the score distance (SD) and orthogonal 
distance (OD) for i th observation (i = 1,2, ...,n) are defined as: 


SDi = 


\ 


k a 2 

\ " % 




ODi = ||xj — Ps i 
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Figure 3: Explained variance proportions for two types of PCA on octane data 


where S = (si,... ,s n ) T is the n x k scoring matrix, P the p x k loading matrix, and 
Ai, ..., Afc are eigenvalues obtained from the PCA, and xi,..., x n are the n observation 
vectors. We only consider the first 2 PCs here, so k is set to 2. From a practical standpoint, 
SD{ can be interpreted as a weighted norm of the projection of the i th point on the 
hyperplane formed by first k principal components, and ODi the orthogonal distance 


of point i from that hyperplane. For outlier detection, following Hubert et al. (2005) 


we set the upper cutoff values for score distances at yjx I ,.975 aR d orthogonal distances 
at [median(0Z) 2 / 3 ) + MAD(CAD 2 / 3 )$ _1 (0.975)] 3 / 2 , where <h(.) is the standard normal 
cumulative distribution function. These cutoffs are marked by red lines in the diagnostic 
plots in figure [4j In the figure we see that CPC A detects only 1 of the 6 outliers (point 
26, left panel). Compared to this the diagnostic plot corresponding to DPCA in the 
right panel correctly detects all 6 outlying points. All of them have larger score distances 
than the score cutoff value, while points 25, 26 and 38 have higher-than-cutoff orthogonal 
distances as well. Interestingly, the arrangement of outlying points here is similar to that 
in the diagnostic plot corresponding to ROBPCA, which can be found in figure 5(b) of 


Hubert et al. (2005) 
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Figure 4: Distance plots for two types of PCA on octane data 


7 Conclusion 


In the above sections we introduce a covariance matrix based on depth-based multivari¬ 
ate ranks that keeps the eigenvectors of the actual population unchanged for elliptical 
distributions. We provide asymptotic results for the sample DCM, its eigenvalues and 
eigenvectors. Bounded influence functions as well as simulation studies suggest that the 
eigenvector estimates obtained from the DCM are highly robust, yet do not lose much in 
terms of efficiency. Thus it provides a plausible alternative to existing approaches of ro¬ 
bust PCA that are based on estimation of covariance matrices (for example SCM, Tyler’s 
scatter matrix, Diimbgen’s symmetrized shape matrix). 

An immediate extension of this would be to study the depth-weighted iterated scatter 
matrices, i.e. matrices T,£> w that are solution to the following type of equations: 


£Dw = E 


(j)x(x)) 2 (x-/i)(x-/x) :] 
(x - /i) T £-px - /i) 


as their eigenvector estimates seem to have better efficiency than those obtained from the 
corresponding DCMs. Unlike the DCM, these matrices will possess the affine equivariance 
property. It is possible to develop tests for central and elliptic symmetry based on the 
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decomposition of the multivariate rank vector in equation [l] The result in Theorem 3.1 


is based on the fact that T(X) = 0, and it holds for any centrally symmetric underlying 
distribution. Moreover, the depth of a point in the standardized scale (i.e. z-scale) does 
not depend on the direction of z. This is not possible without circular symmetry of z, so 
any test of independence between D z {z) and S(z) can be seen as a test of ellipticity for the 
original random variable X. Finally the applicability of this procedure in high-dimensional 
and functional data remains to be explored. 


Appendix 

A Form of Vd,s{F) 


First observe that for F having covariance matrix £ = TAr^, 


v d , s (f) = (r ® t)v d , s (f a )(t ® r) T 


where F\ has the same elliptic distribution as F. but with covariance matrix A. Now, 

(D z (z)) 2 A 1 / 2 zz T A i/ 2 1 T f (D z ( z)) 2 A 1 / 2 zz t A i/ 2 a 

"A D,s}vec t -TtAA- 


V d ,s(F a) = E 
= E 


vec ■ 


^ z T Az ’ J | z r Az 

vec {pz(z)) 2 55(A 1 / 2 z;0)} vec T {( j D z (z)) 2 55(A 1 / 2 z;0)} 
- vec(A D)S )vec T (A D)S ) 


The matrix vec(Ar> t s)vec T (A d^s) consists of elements Xi\j at ( i,j) th position of the 
block, and 0 otherwise. These positions correspond to variance and covariance 
components of on-diagonal elements. For the expectation matrix, all its elements are 
of the form E[\/XaXbX c XdZ a ZbZ c Zd-(Dz{z)) 4 /(z T Az) 2 ], with 1 < a,b,c,d < p. Since 
(D z(z)) 4 /(z t Az ) 2 is even in z, which has a circularly symmetric distribution, all such 
expectations will be 0 unless a = b = c = d, or they are pairwise equal. Following a 


similar derivation for spatial sign covariance matrices in Magyar and Tyler (2014), we 
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collect the non-zero elements and write the matrix of expectations: 


(vp v 'l p p 

(I p 2+Kp^p) EE 7 ab(e a el 0 e b e b) ~ J2^a( e a ea ®e a ea) >+^2^2 labial ®e a el) 

La=16=l a =1 J a=l b=l 

where Ik = (ei,..., e*,), K m ,. n = Sj =i Jij 0 Jfj with .J t j the mxn matrix having 1 as 

( i,j) th element and 0 elsewhere, and 7^ n = £’[A m A n z^2:^.(i)z(z)) 4 /(z T Az) 2 ]: 1 < m, n < 
In¬ 
putting everything together, denote S D (F \) = Y^'i=i{Dz( z i)) 2 SS£i n )/n. Then 
the different types of elements in the matrix Vd,s(Ta) are as given below (1 < a, 6, c, d < 

p): 

• Variance of on-diagonal elements 

AVar(yfrSg(F A )) = E ~ ^D,s,a 

• Variance of off-diagonal elements (a ^ b) 

A w ( / cfl/ 77 P (D Z (z)) 4 \ a \ b zZz% 

AVar{VnS ab {F A )) = E - ( z r Az ) 2 - 

• Covariance of two on-diagonal elements (a ^ b) 

ACov(^nSaa(F A ),y/nS bb (F\)) = E ^ ~ Z ^r Az ^2 & ° ^ ~ ^D,S,a^D,S,b 

• Covariance of two off-diagonal elements (a / b / c / d) 

ACov(V^S°{F a ),^S°{Fk)) = 0 
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Covariance of one off-diagonal and one on-diagonal element (o / 6 / c) 


ACov{yfr&2,{F A ),yfc§g(F A )) = 0 


B Proofs 


Proof of Theorem 3.1. The proof follows directly from writing out the expression of CovfX .): 


Con(X) = E(XX r ) - E(X)E(X) 

11 7 11 2 
\2 ll z ll 


= T.E 


= T.E 


(D z (z)Y 


Pz(z)) : 


HA 1 / 2 ! 


a 1/2 s(z)s(z) t a 1 / 2 r T - o r) o? 


J P' J P 


i A 1 / 2 zz r A 1 / 2 
z T Az 


□ 


Proof of Lemma f.l. For two positive definite matrices A,B, we denote by A > B that 


A — B is positive definite. Also, denote 


S n = \fn 


n 


^|(^(x,)) 2 -( J Dx(x l )) 2 55(x i ;AJ 


2= 1 


Now due to the assumption of uniform convergence, given e > 0 we can find JVgK such 
that 


|( J D” 1 (x l )) 2 -( J D x (x l )) 2 |<6 


(B.l) 


for all n\ > N;* = 1,2,..., rzi- This implies 


Sni < ey/ni 
= ey/nf 


-i '"l 

-E S5 ( Xl ^»i 

‘'l ■ n 

-| ^T -i , *'± 

- ^2 {SSiptii fi ni ) - SS{xi-,n )} + — ^2sS(xi;ij,) 


n i * .. 

2=1 


2=1 


(B.2) 


We now construct a sequence of positive definite matrices {A^B^ + Cjf) : k E N} so 
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that 


f Nfc 

A k = \, B k =^N~ k ^Y^{SS(*i-fi Nk ) - SS{xi\fj)} 


C k = y/Nk 


2=1 


N k 


N k 


E‘ S 'S , (x i ;/x) 


2=1 


where N k E N gives the relation (B.l) in place of N when we take e = l/k. Under 


conditions _E7||x — /x|| 3 / 2 < oo and y/n(fi n — //,) = Op( 1), the sample SCM with unknown 
location parameter / i n has the same asymptotic distribution as the SCM with known 


location /x (Dlirre et al., 2014), hence B k = op(l), thus A k (B k + C k ) —> 0. 


Now (B.2) implies that for any e± > 0, Sxv fe > ei =>• A k (B k + C k ) > ei, which means 


P(S]y k > ei) < P(A k (B k + C k ) > ei). Hence the subsequence {S'tvj.} -> 0. Since the main 

p 

sequence {S k } is bounded below by 0, this implies {S k } —> 0. Finally, we have that 

' n i 71 

- ^(^(x l )) 2 5S(x l ; A n ) - - ^( J Dx(x i )) 2 55(x i ; /*) 

2=1 

- V i' S ' S '( X b An) - SS(xf, fj,)} 

71 LJ 


2=1 


< 


S n + Vn 


2=1 


(B.3) 


Since the second summand on the right hand side is op(l) due to Diirre et al. (2014) as 
mentioned before, we have the needed. □ 


Proof of Theorem f.2. The quantity in the statement of the theorem can be broken down 


as: 


vec < — 
n 


E^x( x «)) 2 ^(x j; An) f _ vec I ~ E (Bx(xi)) 2 SS(xi; /x) 


2=1 


vec < 


2=1 


+ 


n 


1 

E(A ) x(xj)) 2 <S'S'(x i ;/x) \-E vec |(L> x (x)) 2 5S’(x; /x)| 


2=1 


The first part goes to 0 in probability by Lemma 4.1, and applying Slutsky’s theorem we 
get the required convergence. □ 


Proof of Theorem f.S. See Taskinen et al. (2012) 


□ 


Proof of Corollary\4-.4\ I n spirit, this corollary is similar to Theorem 13.5.1 in Anderson 
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(3rd ed. 2003), and indeed, Taskinen et al. (2012) used this theorem to prove Theorem 


4.3 Due to the decomposition Q we have, for the distribution F\, the following relation 
between any off-diagonal element of S D (F\) and the corresponding element in the estimate 
of eigenvectors Tjj(F a): 


VniD,ij(F A ) = sfn- 


S^(Fa) 


A D,S,i - A D,s,j 

So that for eigenvector estimates of the original F we have 




— li) = VnT{^ Di {F K )-ei) = vn 


lD,ik( F A)lk + {id,u{Fk) - 1)7 

k=l\k^i 

(B.4) 


VF{id,u(Fa) - 1 ) = op( 1 ) and ACov(^nS^(F\), aJuS^Fa)) = 0 for k / l , so the 
above equation implies 

AVar(gi) = AVar(Vn('y Dji - 7J) = ^ —-—-- y^lklk 

k=l;k^=i ^ D,S ’ 1 D,S,k) 


For the covariance terms, from (B.4) we get, for i 7 ^ j, 


ACov{ gi, gj) = ACov{y/n{^ D)i - 7J, Vn(l D ,j ~ 7 j)) 

( p p \ 

E VniDMFAhki ^2 Vn'jD,jk(F A )'y k I 

k=l;k^i k=l\k^j J 

= ACov (Vn A / D ,ij{F A )'Y j , y/rn D ,ji{F\)^i) 

AVar(y/nSF( A)) T 

(A D,s,i - A D,S,j) 2 1 

The exact forms given in the statement of the corollary now follows from the Form of 
Vd,s in Appendix [Aj 


31 













For the on-diagonal elements of S D (F A ) Theorem 
for i = 1, Hence 

AVar(li) = AVar(^/n\D, s ,i ~ Vn\D,s,i) 

= AVar(v / n\ D , S! i(F A ) - VnX D ,s,i(F A )) 

= AVar(V^S°(F A )) 

A similar derivation gives the expression for AVar(li,lj);i 7 ^ j. Finally, since the 
asymptotic covariance between an on-diagonal and an off-diagonal element of S D (F A ), it 
follows that the elements of G and diagonal elements of L are independent. □ 
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